Data Feminism as a Challenge for Digital Humanities?

During the annual conference of the DHd Association, the Empowerment Working Group held a workshop on Data Feminism in the Digital Humanities (organized by Luise Borek, Nora Probst & Sarah Lang; technical support: Yael Lämmerhirt) [1]. This short blog post presents preliminary results in order to document the event and raise awareness of this essential topic. Everyone is invited to participate in the project; if you are interested, please contact the Empowerment Working Group.

Citation suggestion: Luise Borek*, Elena Suárez Cronauer, Pauline Junginger, Sarah Lang, Karoline Lemke & Nora Probst, Data Feminism as a Challenge for Digital Humanities? [English version], in LaTeX Ninja Blog, 01.07.2023. https://latex-ninja.com/2023/07/01/data-feminism-as-a-challenge-for-digital-humanities/ *All authors contributed equally.

Disclaimer: This is a machine-translated version of the original German article, produced with ChatGPT 4. I read over it to make sure there’s nothing wildly inappropriate in there, but since the terms used are crucial when it comes to this topic, the German version is the one we ultimately stand behind. This direct translation from German loses some nuance and is less concise than the original (sorry!). For your information: The DHd (Digital Humanities in the German-speaking area) working group with the cheesy name “Empowerment” (we would have called it “diversity” but we’re really not diverse enough, so empowerment it is) was founded as a consequence of an earlier post on this blog: The Computational Humanities and Toxic Masculinity? A (long) reflection. So it seemed right to repost an English version of our working paper/blog post here. Back then, I was still a little bit afraid of the reactions my blog post might entail (you can read about it in the disclaimer part of that blog post). Today, I’m not afraid anymore and proud of our working group. Join us!

Introduction

Data Feminism, based on the book of the same name by Catherine D’Ignazio and Lauren F. Klein (MIT Press 2020), is a movement within intersectional feminism that questions white, cis-male dominated narratives and structures in data science. This not only applies to data analysis but also to the modeling, collection, curation, and presentation of datasets. Data feminist approaches are of significant importance and offer exciting research desiderata for the Digital Humanities; however, they have not yet been firmly established within the German-speaking DH community. Especially lacking are concrete guidelines and frameworks on how to integrate data feminist approaches into the everyday project practices of DH.

In addition to their data and technical competence, the digital humanities are characterized by the influence of different disciplinary traditions and discipline-specific methods. However, all of these disciplinary traditions share the fact that they are shaped by hegemonic norms, models of thinking, classifications, institutions, and structures that have emerged from patriarchal, colonial, racist, capitalist, and other systemic hegemonic power relations. Due to their object-oriented research practices in dealing with material, visual, and textual sources of cultural heritage, the Digital Humanities face the risk of transferring hegemonic patterns of thought found in these sources into data-driven research discourses without sufficient critical reflection on power dynamics. It is not only about designing new epistemological frameworks for the datafication and encoding of cultural heritage that prevent the transmission of existing forms of discrimination into the digital realm. A critical reflection on power dynamics also pertains to the accumulation of sources and data itself. Collection practices often rely on colonialist, Eurocentric, cis-male perspectives, which raises the question of how the Digital Humanities intend to address the gaps in the archives that hinder or completely prevent the exploration of perspectives beyond cis-male, white narratives.

This historically rooted imbalance manifests itself in the fact that data about marginalized groups are often not preserved. Where they are preserved, it is usually through the perception of hegemonic groups. Archives thus simultaneously produce (in)visibilities and (in)expressibilities.

Women and other groups that exist beyond cis-male identities are particularly affected by this. Within collections of cultural heritage, gender is usually a category that represents a later ascription rather than self-identification, often due to historical circumstances. Therefore, a gender-sensitive handling of this data starts with its modeling. Gender-sensitive representation is just one important aspect among many that concern digital humanists at large, yet it has not received the necessary attention and careful reflection. The classification of gender and its digital representation are always ethical decisions. However, adapting historical classification systems afterward is highly complex and poses a series of challenges that have not only informational and scientific implications but also interdisciplinary and historical implications. Therefore, besides ethical and political aspects, criteria need to be formulated within the scientific context on how to address existing imbalances in the context of quantitative methods. The distortions created by the so-called “gender data gap” can affect the validity and interpretability of scientific results.

This is where our workshop comes in. Together with the participants, we aimed to create a first draft for a guide that provides interested individuals with concise fundamental information and assistance regarding the following questions:

  • What is Data Feminism, and to what extent does the term find consideration in Digital Humanities discourses?
  • Why is Data Feminism needed in DH, and to what extent should its approaches be considered within the existing DH discourses?
  • What can Data Feminism look like in the Digital Humanities, and what should it encompass?
  • What is meant by the term Data Feminism?
  • In which types of projects or research questions should researchers feel compelled to meet the demands of Data Feminism? And if so, are there simple approaches that can be considered and implemented right from the start of a project?
  • How can data feminist projects be implemented in the DH? To what extent can research fields be defined that serve as a good starting point for interested individuals to immediately and practically implement some approaches of Data Feminism?

During the workshop, four problem areas were discussed, and their central premises will be briefly outlined here (for more details see the abstract of the original workshop). It quickly became apparent in the discussion that the terminologies we work with constitute a separate area of work that will also be considered here.

Terms

Throughout the workshop, we repeatedly encountered the issue of the dynamic nature of terms and conventions in the context of feminism and gender-inclusive language. It is essential to use terms reflectively and provide justification for their use in digitally provided and processed historical sources, as well as in accompanying texts. Additionally, one’s own position in society should be made transparent and reflected as part of scientific work. It is crucial for us to use inclusive terms such as “FLINTA” [2] to distance ourselves from reactionary streams of feminism. At the same time, due to the lack of diversity among workshop participants, we can only provide a perspective of critical whiteness concerning intersectional forms of discrimination.

Similarly, terms such as sex, gender, and gender identity, as well as their relationships to each other, need to be critically examined. While the German term for biological sex (Geschlecht) can be translated directly as sex in English without loss of meaning or nuance, gender is different: adequately translating the term into German is problematic because it encompasses the concepts of gender identity, social gender, and the relationship between sex and gender roles. Therefore, we kept the English term gender, even in German, defining it as culturally, socially, and historically conditioned identity concepts attributed to the categories of feminine and masculine [3]. Gender, as a historical social category, plays a crucial role in the handling of data within the context of Data Feminism.

Gender Data Gap and Cultural Heritage

Given the mandate of memory institutions to collect data, strategies and measures need to be developed to reduce the existing Gender Data Gap. Particularly in view of the growing number of digital archives dedicated to preserving cultural heritage, existing marginalizations should not be reproduced; instead, new problem-oriented strategies should be developed for both contemporary and historical data. Authority data (German: Normdaten), which systematically record individuals who publish, represent a fundamental problem in this regard. Authority records are often biased as a result of unexplained and inadequate data collection and maintenance. It is questionable whether this type of data collection can adequately represent the complexity of gender. To expand the possibilities of gender modeling, we must move away from recording gender as a binary, fixed category and give more weight to gender in the sense defined above. One approach could be critically annotated multiple entries that include temporal references and carry a weighting or qualification.
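A first practical step toward making the gap visible is to look at how the gender field is actually populated in a set of records. The following is a minimal sketch, not a method used in the workshop: it assumes records are available as plain dictionaries, and the field name `gender` and the function `gender_field_summary` are hypothetical names chosen for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

NOT_RECORDED = "(not recorded)"


def gender_field_summary(
    records: Iterable[Mapping[str, object]],
    field_name: str = "gender",  # hypothetical field name, adapt to your schema
) -> Dict[str, float]:
    """Return the share of records per recorded value.

    Missing or empty values are counted explicitly instead of being dropped,
    so that gaps in the data stay visible in the result.
    """
    counts: Counter = Counter()
    for record in records:
        raw = record.get(field_name)
        value = raw.strip() if isinstance(raw, str) else ""
        counts[value or NOT_RECORDED] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: count / total for value, count in counts.items()}


# Invented toy records, for illustration only.
sample = [
    {"name": "A", "gender": "female"},
    {"name": "B", "gender": "male"},
    {"name": "C", "gender": "male"},
    {"name": "D"},
]
summary = gender_field_summary(sample)
```

Such a summary only describes what a catalogue records. It says nothing about people who never entered the catalogue, nor about whether a recorded value is a self-designation or a later ascription.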

Modeling, Curation, Data, and Corpus Criticism

Data models are to be understood as temporally, spatially, and situationally fixed snapshots of reality, which necessarily involve the reduction of complexity. When reusing a data model or the data based on it, the context in which the model was created, including the underlying knowledge traditions, must be considered. The model should be questioned regarding who modeled what for what purpose and what is thereby made visible or invisible. When modeling identity categories, it should be considered that visibility poses a potential danger for marginalized individuals. D’Ignazio and Klein refer to this as the “paradox of exposure” (2020, 105). Several widely used data models, such as the PICA cataloguing schema of the GND, TEI-XML, or data modeling in Wikidata, pose challenges for modeling gender. Often, only a binary assignment of gender is possible, without distinguishing between sex and gender and without allowing a fluid assignment of gender that takes temporal aspects into account. Similarly, uncertainties regarding external designation and self-designation of gender are not modeled. However, there is an increasing number of examples that strive for a more sensitive representation of various identity categories [4]. It would be desirable for the community to derive lessons learned and best practices from these examples. Our working group aims to contribute to this with our future work.
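To make the modeling gaps named above more tangible, here is a minimal sketch of a record that allows several annotated statements about gender instead of one fixed value. It is an illustration under our own assumptions, not an existing schema: the class and field names (`GenderAssertion`, `PersonRecord`, `designation`, `certainty`) are hypothetical, and the choice to list self-designations first is one possible editorial decision among others.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class GenderAssertion:
    """One statement about a person's gender, together with its provenance."""

    value: str                        # free text or a term from a controlled vocabulary
    designation: str                  # "self" or "external": who made the statement
    source: str                       # where the statement comes from
    valid_from: Optional[int] = None  # first year the statement applies, if known
    valid_to: Optional[int] = None    # last year the statement applies, if known
    certainty: float = 1.0            # editorial weighting between 0 and 1
    note: str = ""                    # critical annotation

    def __post_init__(self) -> None:
        if self.designation not in ("self", "external"):
            raise ValueError("designation must be 'self' or 'external'")
        if not 0.0 <= self.certainty <= 1.0:
            raise ValueError("certainty must lie between 0 and 1")

    def applies_in(self, year: int) -> bool:
        """True if the statement covers the given year (open bounds count as covered)."""
        starts_ok = self.valid_from is None or self.valid_from <= year
        ends_ok = self.valid_to is None or year <= self.valid_to
        return starts_ok and ends_ok


@dataclass
class PersonRecord:
    name: str
    gender_assertions: List[GenderAssertion] = field(default_factory=list)

    def assertions_in(self, year: int) -> List[GenderAssertion]:
        """Statements valid in `year`: self-designations first, then by certainty."""
        valid = [a for a in self.gender_assertions if a.applies_in(year)]
        return sorted(valid, key=lambda a: (a.designation != "self", -a.certainty))


# Invented example data, for illustration only.
person = PersonRecord(name="Example Person")
person.gender_assertions.append(
    GenderAssertion(
        value="female",
        designation="external",
        source="hypothetical library catalogue",
        certainty=0.6,
        note="ascribed by a cataloguer, basis unknown",
    )
)
person.gender_assertions.append(
    GenderAssertion(
        value="non-binary",
        designation="self",
        source="hypothetical letter",
        valid_from=1925,
    )
)
```

Keeping every statement with its source and time span means that an ascription is never silently overwritten, and that a query has to state which year and which kind of designation it is asking about. Whether such information should be recorded at all remains subject to the paradox of exposure mentioned above.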

Machine Learning as Bias Amplifier or Opportunity?

Machine learning needs to be considered in a differentiated manner in the context of Data Feminism and the associated considerations. To prevent machine learning from acting as a bias amplifier, two aspects need to be emphasized. Firstly, the data used as a basis for machine learning tools must be transparent and verifiable. This includes issues of personal rights and copyright (e.g., when using large amounts of image data) as well as ethical data collection (e.g., filtering large amounts of text under precarious working conditions). Secondly, we need to critically question our understanding of the tools, software, and algorithms we use to conduct our analyses: Do we unknowingly amplify existing biases when using them? Can we ensure a critical discussion of methods and algorithms that guarantees the reproducibility of our research? When considering the reuse of such technologies, their development history should also be discussed: Can we gain insight into the working groups that developed the tools, software, or algorithms? What risks do personal biases of such groups pose, especially when they are not diverse?

Conclusion and Future Work

To address the challenges outlined here that arise from engaging with Data Feminism, the digital humanities have both the task and the opportunity to create structures that critically counter biases and imbalances. We can already learn from the experiences and perspectives of various research projects, which can open up spaces for dialogue that benefit the digital humanities community. We advocate for the development of strategies and measures to make the Gender Data Gap more visible and, where possible, reduce it, for both present-day and historical data. Instead of recording gender as a binary, fixed category, more modeling options are needed, such as multiple entries with weighting or qualification, ideally accompanied by critical commentary on gender and temporal aspects. The classification of gender and its digital representation always involve ethical decisions. In addition to ethical and political aspects, scientific criteria for addressing the Gender Data Gap are needed, as the resulting distortions affect the validity of scientific results. This also applies to the reuse of data models and machine learning tools. We call for a differentiated questioning and examination of these, as well as a critical discussion of methods and algorithms, in order to counter modeling structures that are adopted unreflectively or unconsciously, such as inadequate or missing descriptions of identity categories.

The development and discussion of these strategies and measures can only be briefly outlined here. The productive exchange during the workshop at the DHd conference not only revealed the great need and existing interest in the topic but also brought together concrete experiences from research projects and existing expertise.

Moving forward, we aim to contribute to further anchoring the topic within digital humanities and enabling corresponding research questions. Accordingly, we plan to create a guide on Data Feminism in Digital Humanities. This guide will provide fundamental information on practical data feminist work in digital humanities to facilitate an uncomplicated and accessible entry into the topic and provide a comprehensive overview. It will also develop concrete strategies for implementing the demands of Data Feminism in the context of digital humanities and provide best practice examples as guidance for research projects and researchers. An annotated bibliography on the topic has already been established and will be continuously updated.

The Empowerment Working Group has established a subgroup on Data Feminism and warmly invites all interested parties to participate in this and all other activities!

For more information, please refer to our websites and their contact sections:

https://dig-hum.de/ag-empowerment

https://empowerdh.github.io/

Footnotes

[1] The workshop’s abstract and a selected bibliography can be found here: Lang, Sarah, Borek, Luise, & Probst, Nora. (2023, March 10). Data Feminism in DH: Hackathon und Netzwerktreffen. DHd 2023 Open Humanities Open Culture. 9. Tagung des Verbands “Digital Humanities im deutschsprachigen Raum” (DHd 2023), Trier, Luxemburg. https://doi.org/10.5281/zenodo.7715422. Further contributions by the Empowerment Working Group at DHd 2023 are: Tessa Gengnagel, Open DH? Mapping Blind Spots (DHd2023 Report), in DHd Blog, 31 March 2023, https://dhd-blog.org/?p=19235, and Gengnagel, Tessa, Lang, Sarah, Probst, Nora, Gerber, Anja, Dang, Sarah-Mai, Duan, Tinghui, Grallert, Till, Keck, Jana, & Nyhan, Julianne. (2023, March 10). Open DH? Mapping Blind Spots. DHd 2023 Open Humanities Open Culture. 9. Tagung des Verbands “Digital Humanities im deutschsprachigen Raum” (DHd 2023), Trier, Luxemburg. https://doi.org/10.5281/zenodo.7715329.

[2] The term FLINTA (in German) stands for Women, Lesbians, Intersex individuals, Non-binary individuals, Trans individuals, and Agender individuals. See: https://queer-lexikon.net/2020/05/30/flint/

[3] See: https://queer-lexikon.net/2017/06/15/gender/

[4] Differentiated role models in digital projects can be found, for example, at: https://credit.niso.org/. There are also several works on the topic, such as Modelling Gender Diversity – Research Data Representation Beyond the Binary (https://dh-abstracts.library.virginia.edu/works/11783), the Orlando Project (based on the CWRC Ontology: https://sparql.cwrc.ca/ontologies/cwrc-preamble-EN.html, where gender is modeled as an event), Women Writers Online, which supports the Text Encoding Initiative but also already includes the division into <sex/> and <gender/> (see Beshero-Bondar, Elisa, Viglianti, Raffaele, Bermúdez Sabel, Helena, & Jenstad, Janelle. (2022, September 18). Revising Sex and Gender in the TEI Guidelines. Zenodo. https://doi.org/10.5281/zenodo.7091048), https://homosaurus.org/, https://www.wikidata.org/wiki/Wikidata:WikiProject_LGBT, and RiC-O, an archival indexing model that does not consider gender at all but only the relationship to the object and other data: https://www.ica.org/standards/RiC/ontology#Person.
